TensorFlow Project

Let's wrap up this Deep Learning section by taking a quick look at the effectiveness of Neural Nets!

We'll use the Banknote Authentication Data Set from the UCI repository.

The data consists of 5 columns:

  • variance of Wavelet Transformed image (continuous)
  • skewness of Wavelet Transformed image (continuous)
  • curtosis of Wavelet Transformed image (continuous)
  • entropy of image (continuous)
  • class (integer)

Where class indicates whether or not a Bank Note was authentic.

This sort of task is perfectly suited for Neural Networks and Deep Learning! Just follow the instructions below to get started!

Get the Data

Use pandas to read in the bank_note_data.csv file


In [1]:
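
# One possible fill-in for this cell:
import pandas as pd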


In [3]:
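
# A sketch of the read-in step; the DataFrame name 'data' is an assumed choice:
data = pd.read_csv('bank_note_data.csv')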

Check the head of the Data


In [61]:
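
# Assuming the DataFrame is named 'data':
data.head()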



Out[61]:
   Image.Var  Image.Skew  Image.Curt   Entropy  Class
0    3.62160      8.6661     -2.8073  -0.44699      0
1    4.54590      8.1674     -2.4586  -1.46210      0
2    3.86600     -2.6383      1.9242   0.10645      0
3    3.45660      9.5228     -4.0112  -3.59440      0
4    0.32924     -4.4552      4.5718  -0.98880      0

EDA

We'll just do a few quick plots of the data.

Import seaborn and set matplotlib inline for viewing


In [67]:
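
# One way to do this:
import seaborn as sns
%matplotlib inline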

Create a Countplot of the Classes (Authentic 1 vs Fake 0)


In [68]:
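
# A possible solution:
sns.countplot(x='Class', data=data)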



Out[68]:
<matplotlib.axes._subplots.AxesSubplot at 0x130bde4a8>

Create a PairPlot of the Data with Seaborn, set Hue to Class


In [69]:
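
# For example:
sns.pairplot(data, hue='Class')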



Out[69]:
<seaborn.axisgrid.PairGrid at 0x1313429e8>

Data Preparation

When using Neural Network and Deep Learning based systems, it is usually a good idea to standardize your data. This step isn't actually necessary for our particular data set, but let's run through it for practice!

Standard Scaling

Import StandardScaler() from SciKit Learn


In [71]:
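
# One possible solution:
from sklearn.preprocessing import StandardScaler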

Create a StandardScaler() object called scaler.


In [72]:
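
# Create the scaler object:
scaler = StandardScaler()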

Fit scaler to the features.


In [73]:
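
# A sketch: fit on everything except the label column.
scaler.fit(data.drop('Class', axis=1))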



Out[73]:
StandardScaler(copy=True, with_mean=True, with_std=True)

Use the .transform() method to transform the features to a scaled version.


In [74]:
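
# 'scaled_features' is an assumed name for the result:
scaled_features = scaler.transform(data.drop('Class', axis=1))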

Convert the scaled features to a dataframe and check the head of this dataframe to make sure the scaling worked.


In [77]:
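
# 'df_feat' is an assumed name; data.columns[:-1] holds the four feature names:
df_feat = pd.DataFrame(scaled_features, columns=data.columns[:-1])
df_feat.head()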



Out[77]:
   Image.Var  Image.Skew  Image.Curt    Entropy
0   1.121806    1.149455   -0.975970   0.354561
1   1.447066    1.064453   -0.895036  -0.128767
2   1.207810   -0.777352    0.122218   0.618073
3   1.063742    1.295478   -1.255397  -1.144029
4  -0.036772   -1.087038    0.736730   0.096587

Train Test Split

Create two objects X and y which are the scaled feature values and labels respectively.


In [79]:
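
# The scaled features:
X = df_feat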


In [80]:
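
# The labels:
y = data['Class']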

Use the .as_matrix() method on X and y and set them equal to the result. We need to do this so that TensorFlow accepts the data in NumPy array form instead of as pandas objects.


In [81]:
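
# Note: .as_matrix() was removed in newer versions of pandas;
# .values or .to_numpy() does the same job there.
X = X.as_matrix()
y = y.as_matrix()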

Use SciKit Learn to create training and testing sets of the data as we've done in previous lectures:


In [45]:
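
# One possible solution (very old scikit-learn versions imported this
# from sklearn.cross_validation instead):
from sklearn.model_selection import train_test_split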


In [46]:
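
# test_size=0.3 is an assumed choice:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)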

Contrib.learn

Import tensorflow.contrib.learn.python.learn as learn


In [82]:
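
# This path only exists in 1.x-era TensorFlow releases;
# tf.contrib was removed in TensorFlow 2.x.
import tensorflow.contrib.learn.python.learn as learn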

Create an object called classifier which is a DNNClassifier from learn. Set it to have 2 classes and a [10,20,10] hidden unit layer structure:


In [83]:
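
# A sketch; depending on your TensorFlow version, DNNClassifier may
# also require a feature_columns argument.
classifier = learn.DNNClassifier(hidden_units=[10, 20, 10], n_classes=2)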

Now fit classifier to the training data. Use steps=200 with a batch_size of 20. You can play around with these values if you want!

Note: Ignore any warnings you get; they won't affect your output


In [94]:
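
# Fit to the training data:
classifier.fit(X_train, y_train, steps=200, batch_size=20)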



/Users/marci/anaconda/lib/python3.5/site-packages/tensorflow/python/ops/array_ops.py:1197: VisibleDeprecationWarning: converting an array with ndim > 0 to an index will result in an error in the future
  result_shape.insert(dim, 1)
Out[94]:
DNNClassifier()

Model Evaluation

Use the predict method from the classifier model to create predictions from X_test


In [95]:
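
# 'note_predictions' is an assumed name; the list() call guards against
# versions where predict returns a generator.
note_predictions = list(classifier.predict(X_test))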

Now create a classification report and a Confusion Matrix. Does anything stand out to you?


In [96]:
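
# One possible solution:
from sklearn.metrics import classification_report, confusion_matrix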


In [97]:
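
# The confusion matrix:
print(confusion_matrix(y_test, note_predictions))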



[[237   0]
 [  1 174]]

In [98]:
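
# The classification report:
print(classification_report(y_test, note_predictions))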



             precision    recall  f1-score   support

          0       1.00      1.00      1.00       237
          1       1.00      0.99      1.00       175

avg / total       1.00      1.00      1.00       412

Optional Comparison

You should have noticed extremely accurate results from the DNN model. Let's compare this to a Random Forest Classifier for a reality check!

Use SciKit Learn to Create a Random Forest Classifier and compare the confusion matrix and classification report to the DNN model


In [99]:
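
# One possible solution:
from sklearn.ensemble import RandomForestClassifier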


In [100]:
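
# 'rfc' is an assumed name; n_estimators=200 matches the output below.
rfc = RandomForestClassifier(n_estimators=200)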


In [101]:
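
# Fit to the same training data used for the DNN:
rfc.fit(X_train, y_train)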



Out[101]:
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=200, n_jobs=1,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False)

In [102]:
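
# 'rfc_preds' is an assumed name:
rfc_preds = rfc.predict(X_test)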


In [103]:
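
# The classification report for the random forest:
print(classification_report(y_test, rfc_preds))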



             precision    recall  f1-score   support

          0       1.00      0.98      0.99       237
          1       0.98      0.99      0.99       175

avg / total       0.99      0.99      0.99       412


In [104]:
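
# And its confusion matrix:
print(confusion_matrix(y_test, rfc_preds))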



[[233   4]
 [  1 174]]

It should also have done very well, but not quite as well as the DNN model. Hopefully you have seen the power of DNNs!

Great Job!